Skip to content

feat(fh-agent): site.spec.yaml → deployable edge-agent bundle compiler#3

Closed
r1marcus wants to merge 28 commits into
mainfrom
feat/fh-agent-cli
Closed

feat(fh-agent): site.spec.yaml → deployable edge-agent bundle compiler#3
r1marcus wants to merge 28 commits into
mainfrom
feat/fh-agent-cli

Conversation

@r1marcus

Copy link
Copy Markdown
Collaborator

Summary

Adds fh-agent, a Go CLI under go/cmd/fh-agent/ that compiles a
high-level site.spec.yaml into a Docker Compose bundle ready to
scp to a Pi5 / Jetson / NUC / STM32MP25 / Bosch Rexroth ctrlX CORE
and docker compose up.

Mental model: a compiler. Source = spec + hardware target;
target = OCI compose stack with the official engine image + a local
SLM sidecar + configs mounted as volumes.

Pipeline: `spec → plan → validate → build`. Each step is a separate
verb so an iterating agent can re-plan, re-validate, re-build
independently. Plan is byte-deterministic — same inputs ⇒ identical
artifacts.

Roadmap (shipped / open v1.x / open v2) lives at
`go/cmd/fh-agent/ROADMAP.md`.

What's in this PR

  • CLI (`go/cmd/fh-agent/`): 4 phases × ~10 subcommands, JSON-first,
    documented exit codes
  • 5 hardware target profiles as embedded YAML
  • Spec schema + JSON-Schema export for agent self-priming
  • Compose templates (engine + llama-server + optional mosquitto)
  • Cross-validation + subprocess call to `fh-builder validate --json`
    for full contract-schema checking (resilient — warn-fallback if
    `fh-builder` is missing)
  • TS-side: `fh-builder validate` gains `--json` flag emitting
    flat diagnostics array
  • Tests: determinism on all 5 targets, RAM-fit, mangle-catch

Test plan

  • `cd go && go test ./cmd/fh-agent/...` — all green locally
  • `fh-agent plan examples/building-automation.site.yaml --target rpi5-8gb --out build/`
  • `fh-agent validate build/` — returns `[]` exit 0
  • `fh-agent build build/ --name muellers-haus --tar` — produces `dist/muellers-haus/` + `.tar.gz`
  • Mangle the workflow → `fh-agent validate` returns structured JSON diagnostics, exit 1
  • TODO (v1.x): real-device smoke test — bring the bundle up on an actual Pi5 with a fake MQTT publisher

Out of scope (see ROADMAP)

  • Provisioning (`fh-agent deploy --to pi@host`) — v2
  • HIL test harness — v2 (skeleton ships in v1 bundles)
  • Bus types beyond mqtt/gpio/serial — v2

🤖 Generated with Claude Code

MarcusBot and others added 28 commits May 31, 2026 00:13
A Go CLI under go/cmd/fh-agent that turns a high-level site.spec.yaml
into a Docker Compose stack ready to scp + 'docker compose up' on a
Raspberry Pi 5, NVIDIA Jetson Orin Nano, x86 NUC, STM32MP25, or Bosch
Rexroth ctrlX CORE.

Pipeline: spec → plan → validate → build.

  spec     authoring & schema-export (read/write spec only)
  targets  introspection over 5 embedded hardware profiles
  models   suggest --target X --capability Y, RAM-filtered
  plan     deterministic compile of spec × target into 6 artifacts
           (workflow, mapping, resources, manifest, local-models, meta)
  validate cross-check the build dir (channels↔mapping↔resources↔manifest,
           model RAM vs target)
  build    render Compose stack (engine + llama-server + optional
           mosquitto) + README + .env.example, optional --tar

Designed for agents:
  - JSON-first I/O, identical diagnostic shape across commands
  - Documented exit codes (0/1/2/64)
  - Deterministic plan (sorted keys, hashed node ids) so an iterating
    agent sees only meaningful diffs
  - 'targets describe' and 'spec schema' double as documentation so
    Claude doesn't need to load 900 lines of contract YAML

Bundle uses the official ghcr.io engine image with configs mounted as
volumes — no per-customer Docker build, no AGPL-derivative-work question.

v1 limits documented in the README: no contract-schema validation
(separately run fh-workflow validate), no provisioning/OTA/HIL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fh-agent validate now shells out to `fh-builder validate <wf> --json`
to catch contract violations the Go-side cross-checks can't see (missing
required params, wrong arg names, expression shape). Diagnostics merge
into fh-agent's existing JSON array — same shape, same exit codes.

Resilient: if fh-builder is not in PATH (e.g. on a customer device
without Node), validate emits one warn diagnostic and continues.
`--skip-workflow-check` silences it explicitly.

Required TS-side change:
- ts/app/cli/validate.ts: new `--json` flag emits a flat diagnostics
  array on stdout (was human text only). Same field names as
  fh-agent's Diagnostic.

Plan fixes uncovered by the new check:
- Ticker: emit intervalValue + intervalUnit, not legacy intervalMs
- ReadPin/WritePin/OnPinEdge: use `pinReference`, not `channelReference`
- agentTask edges: now carry the required `prompt` Expression
- Expression literals: non-empty default per dataType ("0"/"false"/"")

Example bundle (muellers-haus) now passes both Go cross-checks AND
fh-builder contract validation end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scoped roadmap for go/cmd/fh-agent/ — principles, shipped versions
(v1.0 + v1.1), v1.x hardening backlog ordered by value/effort, v2
phase candidates, and explicit out-of-scope items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The entire file was commented out with only the package declaration
active. No code referenced RetryWithResilienceContext anywhere in the
tree. Drop the dead stub.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Configure weekly Dependabot updates for the Go module under /go,
the npm workspace under /ts, and GitHub Actions workflows. Minor
and patch updates are grouped per ecosystem to reduce PR noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Enable GitHub Sponsors button for the ForestHubAI org. Other funding
platforms are listed commented-out as a reference for later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Route reviews per subtree to @ForestHubAI/maintainers:
- /go/, /ts/    — language subtrees
- /contract/    — critical: drift between go/ts implementations
- /.github/     — CI, workflows, repo meta
- *             — default fallback

Note: the @ForestHubAI/maintainers team must be created in
GitHub org settings (Org Settings → Teams) for the mentions to
resolve. Until then GitHub will show "Unknown owner" warnings on
the branch-protection UI but rules silently no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin per-language indentation and whitespace defaults so editors without
project-aware tooling agree with gofmt, Prettier, and YAML conventions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The auth middleware compared the configured shared secret against the
incoming Authorization header with `!=`, which short-circuits on the
first differing byte. A network-adjacent attacker could in principle use
response-time differences to learn the secret prefix-by-prefix.

Switch to `crypto/subtle.ConstantTimeCompare` after a length check
(length is not secret information, so an early length-mismatch return
does not weaken the guarantee). Add table-style tests covering the
empty-secret, missing-header, wrong-token and correct-token cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep-a-Changelog 1.1.0 format. Backfills go/v1.0.1 and ts/0.1.1
release lines from git history; opens an Unreleased section
covering the fh-agent CLI, dependabot, CODEOWNERS, and README
rewrite work landed since the tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lints contract/*.yaml on PR + push to main with stoplightio/spectral-action.
Baseline ruleset extends spectral:oas; first run surfaces 12 errors (mostly
no-$ref-siblings + missing paths on llmproxy/workflow) and 31 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shared-secret bearer middleware ran on every operation including
Healthz, breaking the standard container-orchestrator pattern where a
kubelet, Docker Compose healthcheck or ECS agent probes /healthz
unauthenticated. The probe would 401 and the orchestrator would mark
the engine unhealthy or refuse to route traffic to it.

Short-circuit the middleware when the strict-handler operationID is
"Healthz" so the probe reaches the handler without credentials. The
response only reports "is the runner attached" — no workflow content
or node state — so the exemption is safe to disclose. Other operations
(Deploy, Stop) continue to require the bearer.

Tests cover Healthz bypass with both a configured and an empty secret.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The vendored backend httpclient used &http.Client{} with no timeout.
Per-call context.WithTimeout is used on the boot/heartbeat paths
(BootCallbackTimeout=10s, HeartbeatTimeout=5s, ProviderLoadTimeout=10s),
but Engine.Deploy → memory.Restore → Snapshot inherits an uncapped ctx
from context.WithCancel(Background) and would hang on an unreachable
backend. RAG/LLM-chat call sites from runner nodes are likewise uncapped.

A 30s defaultTimeout on the http.Client gives defense-in-depth without
interfering with the tighter per-call timeouts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds .github/workflows/lint.yml with two parallel jobs:
- go-lint: golangci/golangci-lint-action@v6 against go/, Go version from go.mod
- ts-lint: npm ci && npm run lint in ts/ (eslint flat config already present)

Separate from ci.yml to keep lint-failures non-blocking for test runs and to
isolate concerns. permissions: contents: read only.

Also adds go/.golangci.yml with sensible defaults:
- enabled: govet, errcheck, staticcheck, ineffassign, unused, gosimple, gosec
- gosec noisy rules excluded (G104, G304, G404)
- govet: enable-all minus fieldalignment + shadow
- test files exempt from gosec/errcheck
- generated go/api/ tree excluded

Local verification:
- ts lint runs clean (0 findings) via `npm run lint`
- golangci-lint not installed locally; YAML syntax verified for both files
- Go lint findings: unknown until first CI run (follow-up: fix or relax)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Disables blank issues and routes off-topic intents (questions, security
reports, commercial license inquiries) to Discussions, private security
advisories, and mailto:root@foresthub.ai respectively.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gistry

Removes the GitHub Packages registry pin (which forced consumers to set up an
auth token even for read access) and points the package at the public npm
registry instead. Adds publishConfig.access=public so npm treats the scoped
@foresthubai/* package as openly published on first publish.

Tightens the files whitelist to dist + README + LICENSE; the previous list
included src, which double-shipped TypeScript sources alongside the compiled
dist/ output (npm pack now reports ~87 kB / 265 files for the tarball).

Follow-up: no release workflow exists today, but when one is added it must
not pin registry-url to https://npm.pkg.github.com.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… registry

Mirrors the workflow-core switch: drops the GitHub Packages registry pin so
consumers can install without an auth token, and sets access=public so the
scoped package is published openly.

Tightens the files whitelist: dist + tailwind-preset.ts + README + LICENSE
plus src/styles/ (load-bearing — the exports map points "./styles/index.css"
at "./src/styles/index.css", so consumers' CSS import would break if the
whole src/ tree were excluded). The rest of src/ (TS sources) no longer
ships. Tarball is now ~212 kB / 394 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nodes

Hero now leads with a concrete, verified number (~32 MB compressed image,
measured by cross-compiling cmd/engine for linux/arm64 and linux/amd64
with CGO_ENABLED=0 and -ldflags='-s -w' on top of distroless/static).
Replaces the previous ~15 MB figure which was not reproducible from a
fresh build. Drops OPC-UA from the pitch — it appears only as a
ctrlX-OS pass-through note in a target manifest, not as an engine node
(go/engine/driver and go/engine/transport implement GPIO/ADC/DAC/PWM,
UART/serial, and MQTT). Pi Zero 2W is replaced by generic "Raspberry
Pi" wording since no Pi Zero target is in cmd/fh-agent/targets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Publishes ghcr.io/foresthubai/edge-agents/engine on every `go/vX.Y.Z` tag —
the same tag that releases the Go module — so the README's `docker run ...`
quickstart actually resolves once the next tag is pushed.

- linux/amd64 + linux/arm64 via buildx + QEMU (cross-compile in builder
  stage, only the final COPY into distroless is emulated)
- docker/metadata-action strips the `go/` prefix so the image gets clean
  semver tags (1.2.3, 1.2, 1) plus `latest`
- keyless cosign signing via GitHub OIDC, signs each tag-by-digest
- SLSA build provenance + SBOM attestations pushed to the registry
- workflow_dispatch with optional tag override for one-off rebuilds

Triggered exclusively on `go/v*` tag-push and manual dispatch; no rolling
build on main, to keep `:latest` aligned with a real release tag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds four shields.io badges that work today: Go version (resolved from
go/go.mod, currently 1.25), GitHub stars, last commit, and open issues.
Each URL was probed (HTTP 200, real SVG content) before being added.

Skipped: release badge (no GitHub releases published yet — only the
go/v1.0.1 git tag exists), Docker Pulls (no public image), Discord
(no server), Codecov (not wired).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the publishConfig switch on both TS packages: replaces the GitHub
Packages walkthrough (consumer .npmrc + FH_PACKAGES_TOKEN) with the npmjs.org
flow (npm login on the publish machine, no consumer setup required). Drops
the "going public" footnote since that is now the steady state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs the GitHub Advanced Security CodeQL analyzer on push to main, on every
PR to main, and on a weekly Sunday schedule (catches CVEs disclosed against
already-merged code). Matrix covers go (autobuild via go/go.mod) and
javascript-typescript (the modern combined identifier; supersedes the split
javascript + typescript languages). security-and-quality query pack runs the
security suite plus the quality rules — slightly noisier than security-only
but worth it for a public repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New section frames edge-agents against three projects users frequently
evaluate alongside it (LangGraph, n8n, Dify). Comparison axes are
container size, offline mode, hardware I/O, MQTT, visual builder,
local SLMs, and license.

Container sizes are sourced: n8n ~368 MB and Dify ~900 MB from current
Docker Hub tags; edge-agents ~32 MB from the same cross-compile +
distroless/static measurement used in the hero. OpenClaw was considered
but dropped — no authoritative size data and the project is not
comparable in scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs gitleaks-action on every push and PR, with fetch-depth: 0 so the scan
covers full git history (not just the latest diff) — historic leaks block
the merge just like fresh ones.

No GITLEAKS_LICENSE configured: the paid tier is unnecessary for a public
repo. If the default ruleset throws false positives on third-party notices,
generated code, or vendored snippets we'll add a .gitleaks.toml allowlist
in a follow-up commit — the first real CI run on this branch will surface
what (if anything) needs ignoring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…N-Schema)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g block

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-advanced-security

Copy link
Copy Markdown

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

@r1marcus

Copy link
Copy Markdown
Collaborator Author

📝 Scope expansion (2026-05-31)

PR has grown beyond the original fh-agent CLI work. 25 additional commits added by @r1marcus on top of the original 3, covering OSS launch readiness. The fh-agent CLI portion is unchanged.

Added commit clusters

1. Engineering hygiene (4 commits)

  • fix(engine): constant-time bearer compare + /healthz unauthenticated
  • fix(httpclient): default 30s timeout
  • chore(llmproxy): remove dead resilience.go + leftover ResilienceConfig block

2. Community files (5 commits)

  • dependabot.yml, FUNDING.yml, CODEOWNERS, .editorconfig, ISSUE_TEMPLATE/config.yml

3. CI baseline (4 commits)

  • lint.yml (golangci-lint + ESLint), spectral.yml, codeql.yml, gitleaks.yml

4. Distribution + Release (4 commits)

  • release.yml: multi-arch ghcr.io + cosign keyless + SBOM + SLSA provenance
  • TS packages switched to npmjs.org public registry
  • RELEASING.md updated

5. README marketing (3 commits)

  • Hero rewrite (verified ~32 MB size), 4 status badges, ## Why edge-agents comparison vs LangGraph/n8n/Dify

6. CHANGELOG.md (Keep-a-Changelog, backfilled)

What's NOT in this PR

  • License split (Apache-2.0 for contract/ + ts/workflow-core/): separate stacked PR
  • README GIF: separate
  • Tier-1 features (HTTP-Request node, run persistence, OTel/metrics, streaming): Phase 3 follow-up

⚠️ Open design questions — please opine

  1. httpclient.go 30s timeout caps all backend calls. If streaming LLM-chat routes through backend, 30s may be too short. Confirm or raise to 60-120s? (go/engine/backend/internal/httpclient/httpclient.go:45)
  2. server.go healthz magic-string "Healthz" — silent-broken if engine.yaml operationId renamed. Want a sanity test pinning the constant?
  3. release.yml — double tag patterns (:v1.2.3 + :1.2.3), dead is_default_branch line, duplicate SBOM/provenance mechanisms — clean up or leave for dual verifier-compat?
  4. .golangci.yml excludes only api/. llmproxy/provider/mistral/types.gen.go is also oapi-codegen output — extend exclude-dirs?
  5. dependabot.yml uses single directory: /ts for npm workspaces — Dependabot only updates root lockfile. Per-workspace entries wanted?
  6. 5 CI workflows without paths: filters or concurrency: groups — every docs-only PR triggers all 5, old runs not cancelled. Optimize?
  7. Spectral first-run will be red — 12 errors + 31 warnings against spectral:oas ruleset. Acceptable baseline backlog, or tune the ruleset / continue-on-error first?
  8. ~32 MB hero figure is calculated (distroless + Go binary + -ldflags='-s -w'), not measured against a real GHCR-pushed image. Self-verifies once release.yml runs on first tag.

Local verification (all green)

  • go vet ./..., go build ./..., go test ./...
  • npm run typecheck, npm run build, npm run test

🤖 Authored with Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants